Description of Assignment


So, I moved on to the next paper anticipating less problems. The paper Ramananjato et al. 2020 analyzes the role of two species of mouse lemur, Microcebus rufus and Microcebus jollyae, in seed dispersal in Madagascar’s forest. The researchers collected seeds which had been defecated by mouse lemurs and seeds which had fallen on the ground them subjected them to various growth experiments. For Microcebus rufus, one experiment looked at seed growth in a petri dish setting and the other looked at seed growth on the forest ground. For Microcebus jollyae, experiments were conducted in a petri dish, in a semi-shaded plot and a shaded plot. From these experiments, the researchers compared germination rate, germination time and seedling length after a certain period of time between defecated and control seeds using mixed effect models and survival analysis.



First load the packages. There are quite a lot of them for this analysis.
library(lme4)
library(ggplot2)
library(ggfortify)
library(Hmisc)
library(ggpubr)
library(MuMIn)
library(tidyverse)
library(survival)
Loading the data into R the first step of the analysis. This was more complicated than I intially realized because every numerical variable was actually loaded in as a factor When attempted to graph the results or run models, the results would have an error or not look as it should have. I used as.numeric to create new numeric varaibles in the data set which the R could read easily.
data <- read.csv("mouse_lemur_data.csv", header = TRUE)
data$species_number <- as.numeric(as.factor(data$scientificName))
data$seedling_mm <- as.numeric(data$seedling_length_mm)
Subsetting the data into something workable based on the experiment and then only the seeds which germinated. This is where we can begin to see that the data set I have is not the data set which is presented in the paper. The number of observations is incorrect in many cases.
First I will subset the data for seeds dispersed by Microcebus rufus
# Subsetting the data to only include seeds dispersed by microcebus rufus
rufus <- subset(data, disperser == "microcebus_rufus")
# Subsetting the data to only include seeds dispersed by microcebus rufus in the petri dish experiment
rufus_petri <- subset(rufus, experiment == "Petri dish")
nrow(rufus_petri)
## [1] 685
# The number of total observations in the experiment is 685
# Now, I am subsettung the data to only have seeds which germinated 
rufus_petri_yes <- subset(rufus_petri, germination_state == "yes")
nrow(rufus_petri_yes)
## [1] 121
# The number of germinated seeds is 121
# This creates summary statistics for seedling growth for each species
rufus_petri_yes_summary <- rufus_petri_yes %>% 
  group_by(scientificName, treatment) %>%
  summarise(seedling_mean = mean(seedling_mm),
            N = n())
## `summarise()` has grouped output by 'scientificName'. You can override using the `.groups` argument.
####  This line modifies the summary statistics to include only the species for which there is a defecated and control. This step was not explicitly stated anywhere in the paper and it took me a very long time to figure it out.   
rufus_petri_yes_summary_modified <- rufus_petri_yes_summary[-c(3, 4, 5, 6, 7, 12), ]
rufus_petri_yes_summary_modified
sum(rufus_petri_yes_summary_modified$N)
# 88


The number of total observations in the experiment is 685. The number of observations germinated is 121. The number of observations after only including species with defecated and control seeds is 88. However, the paper states 150 is total sample size and 88 observations are actually analyzed.

In this case, the researchers seemed to have used the final summary (n=88) for the number of observations actually analyzed (n=88). Concerningly, I do not know where the number 150 as the total number of observations comes from.


# Subsetting the data to only include seeds dispersed by microcebus rufus in the petri dish experiment
rufus_ground <- subset(rufus, experiment == "Forest ground")
nrow(rufus_ground)
## [1] 231
# The number of total observations in the experiment is 231
# Now, I am subsettung the data to only have seeds which germinated
rufus_ground_yes <- subset(rufus_ground, germination_state == "yes")
nrow(rufus_ground_yes)
## [1] 7
#The number of germinated seeds is 7
This experiment does not need a filtering for only species with defecated and control seeds because they are all the same species
The number of total observations in experiment is 231. The number of observations germinated is 7. The umber of observations after only including species with defecated and control seeds is 7 (they are all the same species). However, the paper states 75 is total sample size and 7 are actually analyzed.
The researchers seemed to have used the final summary (n=7) for the number of observations actually analyzed (n=7). Concerningly, I have no idea where the number 75 as the total number of observations comes from.


Now, I will subset the data for seeds dispersed by Microcebus jollyae
jollyae <- subset(data, disperser == "microcebus_jollyae")
jollyae_petri <- subset(jollyae, experiment == "Petri dish")
jollyae_closed <- subset(jollyae, experiment == "Closed")
jollyae_semi <- subset(jollyae, experiment == "Semi-closed")
jollyae_petri_yes <- subset(jollyae_petri, germination_state == "yes")
jollyae_closed_yes <- subset(jollyae_closed, germination_state == "yes")
jollyae_semi_yes <- subset(jollyae_semi, germination_state == "yes")
# This is the violin first graph of the rufus petri dish experiment

ggplot(data = rufus_petri_yes, aes(x = treatment, y = seedling_mm)) +
  geom_point() + theme_bw() + geom_violin( aes(colour = treatment, fill = treatment)) +
  stat_summary(fun = mean, geom="point", shape=23, size=5, fill = "black") +
  labs(title = "Microcebus rufus",subtitle = "Petri dish", 
       y = "mean seedling length (mm)", x = "Treatment") +
  theme(plot.title = element_text(face = "italic")) + 
  theme(plot.subtitle = element_text(hjust = 0.5)) +
  scale_color_manual(values=c("darkgreen", "darkorange3")) +
  scale_fill_manual(values=c("darkgreen", "darkorange3")) + 
  theme(legend.position = "None")

# This caption is to be added later using word (as done in the original study)
#  N = 37(75) 51(75) 
# This second graph is a series of three
# It is complete with axis labels, a title, subtitles, 
# I had to go through a lot of work to hide the y axis on two of them so that they would all 
# fit together without excess information


# First we have the 
plot1 <- ggplot(data = jollyae_petri_yes, aes(x = treatment, y = seedling_mm)) +
   theme_bw() + geom_violin( aes(colour = treatment, fill = treatment)) +
  stat_summary(fun = mean, geom="point", shape=23, size=5, fill = "black") + 
  ylim(0,50) + 
  labs(title = "Microcebus jollyae",subtitle = "Petri dish", 
       y = "mean seedling length (mm)", x = "Treatment") +
  theme(plot.title = element_text(face = "italic")) + 
  theme(plot.subtitle = element_text(hjust = 0.5)) +
  scale_color_manual(values=c("darkgreen", "darkorange3")) +
  scale_fill_manual(values=c("darkgreen", "darkorange3")) + 
  theme(legend.position = "None")
plot1

plot2 <- ggplot(data = jollyae_semi_yes, aes(x = treatment, y = seedling_mm)) +
   theme_bw() + geom_violin( aes(colour = treatment, fill = treatment)) +
  stat_summary(fun = mean, geom="point", shape=23, size=5, fill = "black") +
  ylim(0,50) +
  labs(title = "",subtitle = "Semi-shaded", 
       y = "mean seedling length (mm)", x = "Treatment") +
  theme(plot.title = element_text(face = "italic")) + 
  theme(plot.subtitle = element_text(hjust = 0.5)) +
  scale_color_manual(values=c("darkgreen", "darkorange3")) +
  scale_fill_manual(values=c("darkgreen", "darkorange3")) + 
  theme(legend.position = "None") + 
  theme(axis.text.y = element_blank(), 
        axis.ticks.y = element_blank(), 
        axis.title.y = element_blank())
plot2

plot3 <- ggplot(data = jollyae_closed_yes, aes(x = treatment, y = seedling_mm)) +
  theme_bw() + geom_violin( aes(colour = treatment, fill = treatment)) +
  stat_summary(fun = mean, geom="point", shape=23, size=5, fill = "black") +
  ylim(0,50) +
  labs(title = "",subtitle = "Shaded", 
       y = "mean seedling length (mm)", x = "Treatment") +
  theme(plot.title = element_text(face = "italic")) + 
  theme(plot.subtitle = element_text(hjust = 0.5)) +
  scale_color_manual(values=c("darkgreen", "darkorange3")) +
  scale_fill_manual(values=c("darkgreen", "darkorange3")) + 
  theme(legend.position = "None") + 
  theme(axis.text.y = element_blank(), 
        axis.ticks.y = element_blank(), 
        axis.title.y = element_blank())
plot3

ggarrange(plot1, plot2, plot3, ncol = 3, nrow = 1)

# Just like the first graph, a caption is to be added later using word (as done in the original study)